Improving Precision and Recall for Soundex Retrieval

Authors

  • David O. Holmes
  • M. Catherine McCabe
Abstract

We present a phonetic algorithm that fuses existing techniques and introduces new features. This combination offers improved precision and recall.

Similar resources

Thai-English Cross-Language Transliterated Word Retrieval using Soundex Technique

This paper presents an algorithm for Thai-English cross-language transliterated word retrieval. The algorithm enables retrieval of documents containing either the English keywords or the corresponding English-to-Thai transliterated words. This is done by retrieving documents based on phonetic codes of keywords rather than the keywords themselves. The phonetic coding is based on the Soundex codin...
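To illustrate the general idea of retrieving by phonetic code rather than by the keyword itself, here is a minimal Python sketch. It uses the classic American Soundex rules as a stand-in, which is an assumption: the Thai-English coding described in the abstract above is truncated and is not reproduced here. The names `soundex`, `build_index`, and `search` are hypothetical helpers introduced only for this example.

```python
from collections import defaultdict

# Classic American Soundex digit groups (an assumption; not the paper's coding).
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character American Soundex code, or "" if no A-Z letters."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel sets prev to "", so a repeated digit counts again
    return (code + "000")[:4]


def build_index(docs):
    """Map each Soundex code to the set of document ids containing a matching token."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            code = soundex(token)
            if code:
                index[code].add(doc_id)
    return index


def search(index, keyword):
    """Return ids of documents holding any token that shares the keyword's code."""
    return index.get(soundex(keyword), set())
```

For example, with `docs = {1: "Robert agreed", 2: "Rupert replied", 3: "Ashcraft left"}`, calling `search(build_index(docs), "Rupert")` returns documents 1 and 2, since "Robert" and "Rupert" share a code. This shows the trade-off phonetic matching makes: recall improves, while unrelated words with the same code can lower precision.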

Finding Content in File-Sharing Networks When You Can't Even Spell

The query success rate in current file-sharing systems is low, for example, only 7-10% in Gnutella. An often-overlooked cause for this low recall is simply that keywords in queries and document descriptions are misspelled. Although many sophisticated approximate matching techniques have been developed by the Information Retrieval community, to our knowledge, they have not been used in popular P2...

Transliterated arabic name search

We address name search for transliterated Arabic given names. In previous work, we addressed similar problems with English and Arabic surnames. In each previous case, we used a variant of Soundex and n-grams to improve precision and recall of name matching compared against well-known approaches such as the Russell Soundex algorithm. Unlike prior work, the proposed approach does not rely upon So...

Cross-language Phonetic Similarity Measure on Terms Appeared in Asian Languages

This study aims to develop a phonetic similarity measurement method across Asian languages. The method, a cross-language similarity algorithm, aggregates the transcription of language-specific Romanization, the International Phonetic Alphabet, the Soundex algorithm, and Levenshtein distance. To evaluate the proposed algorithm, this study involves an experiment using ninety-two chemical element nam...

Efficient Name Variation Detection

Semantic integration, link analysis and other forms of evidence detection often require recognition of multiple occurrences of a single name. However, names frequently occur in orthographic variations resulting from phonetic variations and transcription errors. The computational expense of similarity assessment algorithms usually precludes application to all pairs of strings. Instead, it is typ...

Publication date: 2002